import vipy
assert vipy.version.is_at_least('1.9.5')
V = vipy.util.load('/Users/jba3139/Desktop/pip_175k/valset.pkl')
This dataset uses multi-label activities with dense bounding box annotations. Each object may be performing zero or more activities simultaneously, and the framewise labels capture which activities an object is performing in a given frame. This means that a person can simultaneously perform two or more activities, such as "person_talks_on_phone" and "person_opens_facility_door". Overlap can also arise from the MEVA annotation definitions, which can introduce overlapping activities such as "vehicle_dropping_off" and "vehicle_stopping".
We recommend using the framewise labels to export tubelets of deforming bounding boxes over time for training. Examples of extracting labels and boxes from the toolchain are shown below.
v = V[1].mindim(512).load() # load a single video from the dataset
print(v) # Each video has useful information when printed
<vipy.video.scene: height=512, width=512, frames=223, color=rgb, filename="/Users/jba3139/Desktop/pip_175k/videos/car_drops_off_person/778838D5-92E0-49DB-B1C9-AEB05669D125-3542-00000294C2F31029_1.mp4", fps=30.0, category="car_drops_off_person", tracks=1, activities=3>
v[0].show() # show the first frame of video as an annotated image.
v[100].show() # 100th frame of video
v[200].show() # 200th frame of video
<vipy.image.scene: height=512, width=512, color=rgb, category="car_drops_off_person", objects=1>
im = v[100][0].show() # 100th frame, first object
im.crop().show() # crop this object using its bounding box
<vipy.image.imagedetection: height=256, width=170, color=rgb, category="Vehicle", bbox=(xmin=0.0, ymin=0.0, width=170.0, height=256.0)>
im.boundingbox() # return the bounding box for this object
<vipy.object.detection: category="Vehicle", bbox=(xmin=0.0, ymin=0.0, width=170.0, height=256.0)>
im.boundingbox().json() # as JSON for portability
'{"_xmin":0,"_ymin":0,"_xmax":170,"_ymax":256,"_id":"f54a7b84fc48491b9cb28c1e607b9681","_label":"Vehicle","_shortlabel":"Vehicle","_confidence":null,"attributes":{"trackid":"f9991ee6ecac11ea82acac1f6b2c363c","activityid":["f9991bb2ecac11ea82acac1f6b2c363c"],"noun verb":[["Vehicle","dropping off"]]}}'
# make square, crop and resize to 224x224
im = v[100][0].boxmap(lambda bb: bb.dilate(1.2).maxsquare()).crop().mindim(224).show()
print(v.activitylabels(0)) # The labels in frame 0
print(v.activitylabels(100)) # in frame 100
print(v.activitylabels(200)) # in frame 200
lbls = [(k,lbl) for (k,lbl) in enumerate(v.label())] # (frame index, label set) tuples
{'car_stops', 'car_drops_off_person'}
{'car_drops_off_person'}
{'car_drops_off_person', 'car_starts'}
The use of joint activity labels means that activities can occur simultaneously. A single actor can be performing more than one activity at the same time, which means that a loss that assumes one-hot ground truth labels (e.g. categorical cross-entropy) is an inappropriate choice for training. Instead, we recommend a framewise multi-label loss that can be trained with multiple simultaneous labels per frame (e.g. binary cross-entropy).
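As a rough illustration of the multi-label recommendation above, the following sketch encodes a frame's label set as a multi-hot vector and scores it with a per-class binary cross-entropy. It is written in plain Python and is not part of vipy; the three-class vocabulary `CLASSES` and the helper names `multihot` and `binary_cross_entropy` are assumptions made for this example only.

```python
import math

# Hypothetical class vocabulary for illustration; use your full label list in practice.
CLASSES = ['car_drops_off_person', 'car_starts', 'car_stops']

def multihot(labels, classes=CLASSES):
    """Encode a set of activity labels as a multi-hot vector over `classes`."""
    return [1.0 if c in labels else 0.0 for c in classes]

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over independent per-class probabilities."""
    total = 0.0
    for (p, t) in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)

# A frame that carries two simultaneous labels, as in frame 0 of the example above
target = multihot({'car_stops', 'car_drops_off_person'})
print(target)  # [1.0, 0.0, 1.0]

good = binary_cross_entropy([0.9, 0.1, 0.9], target)  # both active classes predicted
bad = binary_cross_entropy([0.9, 0.1, 0.1], target)   # misses 'car_stops'
print(good < bad)  # True
```

Unlike a softmax over classes, each class is scored independently here, so predicting two active classes in the same frame is not penalized.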
We recommend running your own proposal generation pipeline on these videos and using the resulting object tracks to encode the clips for training. This ensures that the tracks representing activities use the bounding box style of your own pipeline.
For example, the following code will run an object detector on each frame of video, and compute the intersection of the returned object detections with the ground truth using a greedy bounding box assignment based on bounding box intersection over union. You can use the resulting annotated frame (e.g. imdet.objects()) as a replacement for the ground truth with your proposals.
from pycollector.detection import ObjectDetector
detect = ObjectDetector()
for im in V[3].mindim(512).stream():
    im.show() # the original labeled image
    imdet = detect(im).show() # the new detections
    imdet.intersection(im, miniou=0.8, bycategory=False).show() # the best assignment of your detection to the truth (if any)
    break
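To make the idea of a greedy assignment by intersection over union concrete, here is a minimal standalone sketch. It is not the vipy or pycollector implementation; the helper names `iou` and `greedy_assign`, the `(xmin, ymin, xmax, ymax)` box tuples and the default threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_assign(detections, truths, miniou=0.5):
    """Match detections to ground truth boxes in order of descending IoU.

    Each detection and each truth box is used at most once.
    Returns a list of (detection_index, truth_index, iou) tuples.
    """
    pairs = sorted(((iou(d, t), i, j)
                    for (i, d) in enumerate(detections)
                    for (j, t) in enumerate(truths)), reverse=True)
    (used_d, used_t, matches) = (set(), set(), [])
    for (score, i, j) in pairs:
        if score < miniou:
            break  # all remaining pairs overlap even less
        if i not in used_d and j not in used_t:
            matches.append((i, j, score))
            used_d.add(i)
            used_t.add(j)
    return matches

truths = [(0, 0, 10, 10)]
detections = [(1, 1, 11, 11), (50, 50, 60, 60)]  # the second box does not overlap the truth
matches = greedy_assign(detections, truths)
print([(i, j) for (i, j, score) in matches])  # [(0, 0)]
```

Greedy matching is simple and usually adequate when there are few objects per frame; an optimal assignment (e.g. the Hungarian algorithm) is an alternative when boxes overlap heavily.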
The MEVA annotation requirements include class-specific temporal padding, which introduces up to five seconds of padding before and after each activity. To be consistent with the MEVA annotation definitions, we have introduced the MEVA padding as a post-processing step. However, this padding can introduce label error during training, because background frames are mislabeled as the target class. Our videos were collected with tight temporal boundaries, as determined by the collectors when the videos were recorded. We recommend undoing the MEVA padding and relabeling the padded frames framewise. Contact us at info@visym.com and we will provide you with these precise framewise labels.
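The sketch below shows one way the padding could be undone once the per-class padding is known. It is an illustration only: the padding values (2 and 3 seconds), the `'background'` label for padded frames, and the helper names `unpad_interval` and `framewise_labels` are all assumptions, and the actual per-class padding must be taken from the MEVA annotation definitions or from the precise framewise labels mentioned above.

```python
def unpad_interval(startframe, endframe, fps, pad_before_sec, pad_after_sec):
    """Shrink a padded, inclusive [startframe, endframe] interval by padding given in seconds."""
    start = startframe + int(round(pad_before_sec * fps))
    end = endframe - int(round(pad_after_sec * fps))
    if start > end:
        raise ValueError('padding is longer than the activity interval')
    return (start, end)

def framewise_labels(startframe, endframe, tight, label, padlabel='background'):
    """One label per frame: `label` inside the tight interval, `padlabel` in the padding."""
    return [label if tight[0] <= k <= tight[1] else padlabel
            for k in range(startframe, endframe + 1)]

# Hypothetical padding of 2s before and 3s after, for a 300 frame activity at 30 fps
tight = unpad_interval(0, 299, fps=30.0, pad_before_sec=2.0, pad_after_sec=3.0)
print(tight)  # (60, 209)

labels = framewise_labels(0, 299, tight, 'car_drops_off_person')
print(labels[59], labels[60], labels[209], labels[210])
# background car_drops_off_person car_drops_off_person background
```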
Our collection platform includes additional labels that can aid in your training. We subdivide broad classes into subclasses, which provide a more challenging task for training. For example, we break out the broad class "person_puts_down_object" into "person_puts_down_object_on_shelf", "person_puts_down_object_on_floor" and "person_puts_down_object_on_table". These are visually distinct activities, which can be rolled up into the single class "person_puts_down_object", but we recommend using the subclasses during training to reduce overfitting. Then, at test time, the original MEVA labels can be used.
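A minimal sketch of rolling subclasses back up to the broad MEVA class at test time is shown below. The mapping covers only the three subclasses named above, and combining subclass scores by taking the maximum is an assumption made for this example rather than a prescribed rule; `rollup` and `rollup_scores` are hypothetical helper names.

```python
# Subclass to broad class mapping, limited to the example given in the text
SUBCLASS_TO_MEVA = {
    'person_puts_down_object_on_shelf': 'person_puts_down_object',
    'person_puts_down_object_on_floor': 'person_puts_down_object',
    'person_puts_down_object_on_table': 'person_puts_down_object',
}

def rollup(labels, mapping=SUBCLASS_TO_MEVA):
    """Map a set of labels to broad classes; labels without a mapping pass through."""
    return {mapping.get(lbl, lbl) for lbl in labels}

def rollup_scores(scores, mapping=SUBCLASS_TO_MEVA):
    """Combine per-subclass scores into broad class scores by taking the maximum."""
    out = {}
    for (lbl, s) in scores.items():
        k = mapping.get(lbl, lbl)
        out[k] = max(out.get(k, s), s)
    return out

print(rollup({'person_puts_down_object_on_floor'}))  # {'person_puts_down_object'}

scores = rollup_scores({'person_puts_down_object_on_shelf': 0.2,
                        'person_puts_down_object_on_table': 0.7,
                        'car_starts': 0.1})
print(scores)  # {'person_puts_down_object': 0.7, 'car_starts': 0.1}
```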
Also, the collection platform includes additional weak labels that can aid in your training. These labels are stored as metadata for each video.
You can access this metadata as a dictionary or by casting a video to a Collector Video object.
V[-1].metadata()
{'collection_id': 'b37f9ace-77ea-4659-8781-5e59176dcd25',
'video_id': 'E0E65439-B009-4215-9F95-BA19BC6A5CD8',
'ipAddress': '109.245.32.25',
'duration': 10,
'app_version': '1.0.22',
'os_version': '13.6.1',
'collection_name': 'Unload something from a rear door',
'program_name': 'MEVA',
'device_identifier': 'ios',
'subject_ids': ['10559b39-25a9-44bb-b9fa-44d6eb2e96ab'],
'device_type': 'iPhone6S',
'frame_rate': 29.974859795010634,
'frame_width': 1080,
'collected_date': '2020-08-23 06:36:46',
'collector_id': '2c4dd6fd-b71a-4850-a688-f3b7761835d4',
'blurred_faces': 0,
'project_name': 'MEVA Car',
'frame_height': 1920,
'project_id': '4c66a969-892c-4114-b711-45b2af02244a',
'orientation': 'portrait',
'rotate': None}
from pycollector.video import Video
v = Video.cast(V[-1])
print(v.geolocation())
print(v.uploaded())
{'ip': '109.245.32.25', 'host': '109.245.32.25', 'isp': 'Telenor d.o.o. Beograd', 'city': 'Belgrade', 'countrycode': 'RS', 'countryname': 'Serbia', 'latitude': '44.8166', 'longitude': '20.4721'}
2020-08-23 06:36:46-04:00
Our collections are organized to introduce diversity in the scene. For example, we instruct the collectors to load and unload both from a trunk and from a rear door of a vehicle to help introduce intra-class diversity for this class. Furthermore, we specify the style of some classes, such as "talk while fidgeting", to introduce additional intra-class variation and reduce actor bias. We also separate out motorcycles and cars as separate activity classes. The collection names are self-explanatory, and the full list is available as follows.
print(set([v.metadata()['collection_name'] for v in V if 'collection_name' in v.metadata()]))
{'Put down a package on the floor and walk away', 'Greet a friend with a handshake while sitting', 'Unload something from a rear door', 'Get into a car using a rear door', 'Put down a package on the table and walk away', 'Come into this scene through a closed door while talking on a phone', 'Drive car turning left while stopping', 'Greet a friend with a hug then chat while standing', 'Sit and read a document', 'Sit and use laptop at table', 'Load something into a rear door', 'Get out of a car using a front door', 'Drive car turning left while starting', 'Steal something from your friend and walk away', 'Hold hands while walking and talking', 'Sit and use laptop on lap', 'Pick up a document from a table and read', 'Pick up an object from the floor while seated and put on the other side of you', 'Pick up an object from the floor and put it down on a neaby shelf', 'Pick up an object from a table and put it down on a high shelf', 'Drive car turning right while starting', 'Greet a friend with a hug then chat while walking ', 'Read and fidget', 'Greet a friend with a handshake then talk while walking', 'Ride a Bicycle', 'Drive car turning right while stopping', 'Drive car backwards', 'Pick up an object from a table and put it down on the floor', 'Drive car turning left ', 'Drop off passenger from car', 'Come into this scene through a closed door', 'Get into a car using a front door', 'Leave this scene through an opening while talking on a phone', 'Drive car backwards while turning left', 'Pick up an object from a shelf and put it down on a table', 'Leave this scene through a closed door', 'Walk while talking and texting on your phone', 'Talk on phone and fidget', 'Greet a friend with a handshake then talk while standing', 'Walk and talk', 'Drive car turning around', 'Carry a heavy object while walking and put it down on the floor', 'Get out of a car using a rear door', 'Talk while fidgeting', 'Come into a scene through an opening while talking on a phone', 'Use a laptop and fidget', 'Greet a sitting friend with a hug', 'Pick up passenger in car', 'Drive car turning right', 'Leave this scene through an opening', 'Hand something to your friend', 'Leave this scene through a closed door while talking on a phone', 'Pick up an object from the floor and put it down on a nearby table', 'Quickstart', 'Drive car backwards while turning right', 'Pick up, then walk and carry a heavy object', 'Come into a scene through an opening', 'Purchase something from a machine'}
Our pipelines support optical flow based stabilization of video. This removes the artifacts caused by hand-held camera motion and stabilizes the background. Remaining artifacts are due to non-planar scenes, rolling shutter distortion and subpixel optical flow correspondence errors.
The pip-175k-stabilized release was constructed by running this stabilization on all videos and updating the object boxes accordingly. You can run this yourself as shown below, or use the public release. You can use the attribute "stabilize" to filter on the stabilization residual and exclude videos with too large a distortion.
d = vipy.util.groupbyasdict(V, lambda v: v.category())
v = d['person_carries_heavy_object'][0].mindim(256).stabilize()
v.frame(0).show()
v.frame(150).show()
print(v.getattribute('stabilize')) # the stabilization residual for filtering poorly stabilized videos
[vipy.flow.stabilize]: Affine coarse to fine stabilization ...
{'mean residual': 1.3057592780679719, 'median residual': 0.9582987531289169}
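Filtering on the residual can be sketched as follows. The example operates on plain dictionaries shaped like the `getattribute('stabilize')` output shown above, so it runs without vipy; the threshold of 2.0, the second residual record and the helper name `well_stabilized` are made up for illustration, and a suitable threshold should be chosen by inspecting your own videos.

```python
def well_stabilized(attr, max_mean_residual=2.0):
    """True if a stabilization attribute dict has a mean residual at or below the threshold.

    `attr` is expected to look like the 'stabilize' attribute printed above;
    a missing attribute (None) or a missing key is treated as not stabilized.
    """
    if attr is None:
        return False
    return attr.get('mean residual', float('inf')) <= max_mean_residual

residuals = [
    {'mean residual': 1.3057592780679719, 'median residual': 0.9582987531289169},  # from the output above
    {'mean residual': 7.5, 'median residual': 6.1},  # made-up poorly stabilized example
    None,                                            # video that was never stabilized
]
kept = [r for r in residuals if well_stabilized(r)]
print(len(kept))  # 1
```

With vipy videos, the same predicate could be applied to `v.getattribute('stabilize')` for each stabilized video in a list comprehension.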
You can export torch or numpy arrays, or just transcode your videos for native ingestion into your pipeline at the appropriate frame size.
d = vipy.util.groupbyasdict(V, lambda v: v.category())
v = d['person_carries_heavy_object'][0]
v = v.crop(v.trackbox().dilate(1.5).maxsquare()).mindim(224).saveas('/tmp/out.mp4')
v.thumbnail(frame=0).show()
v.show(notebook=True)
[vipy.video.annotate]: Annotating video ...
v.torch().shape # export the transcoded video as a torch tensor
torch.Size([282, 3, 224, 224])
v.json() # Export the metadata as a JSON encoded string
/Users/jba3139/dev/vipy/vipy/video.py:423: UserWarning: JSON serialization of video requires flushed buffers, will not include the loaded video. Try store()/restore()/unstore() instead to serialize videos as standalone objects efficiently.
warnings.warn("JSON serialization of video requires flushed buffers, will not include the loaded video. Try store()/restore()/unstore() instead to serialize videos as standalone objects efficiently.")
'{"_filename":"\\/tmp\\/out.mp4","_url":null,"_framerate":30,"_array":null,"_colorspace":"rgb","attributes":{"blurred_faces":0,"collected_date":"2020-05-06 15:27:25","collection_id":"P004C006","collector_id":"533e5fb295","device_identifier":"android","device_type":"CPH1969","duration":16,"frame_height":1920,"frame_rate":30.0,"frame_width":1080,"orientation":"portrait","os_version":"28","project_id":"P004","subject_ids":["20200506_1527244575402164829910826"],"video_id":"20200506_1527244575402164829910826","rotate":null},"_startframe":null,"_endframe":null,"_endsec":null,"_startsec":null,"_ffmpeg":"ffmpeg -i \\/tmp\\/out.mp4 dummyfile","_category":"person_carries_heavy_object","_tracks":{"c3005754eb9011ea9217ac1f6b2c363c":{"_id":"c3005754eb9011ea9217ac1f6b2c363c","_label":"person","_shortlabel":"person","_framerate":null,"_interpolation":"linear","_boundary":"strict","attributes":{},"_keyframes":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281],"_keyboxes":[{"_xmin":65.93,"_ymin":37.92,"_xmax":158.88,"_ymax":186.65},
{"_xmin":65.6,"_ymin":38.52,"_xmax":158.6,"_ymax":186.87},{"_xmin":65.58,"_ymin":38.48,"_xmax":158.84,"_ymax":186.85},{"_xmin":65.55,"_ymin":38.5,"_xmax":159.0,"_ymax":186.84},{"_xmin":65.84,"_ymin":38.81,"_xmax":159.71,"_ymax":187.54},{"_xmin":65.81,"_ymin":37.99,"_xmax":159.76,"_ymax":186.6},{"_xmin":65.78,"_ymin":38.14,"_xmax":159.76,"_ymax":186.6},{"_xmin":66.08,"_ymin":38.55,"_xmax":160.34,"_ymax":187.31},{"_xmin":65.13,"_ymin":38.76,"_xmax":159.34,"_ymax":187.32},{"_xmin":65.15,"_ymin":38.03,"_xmax":159.27,"_ymax":186.4},{"_xmin":65.5,"_ymin":38.48,"_xmax":159.82,"_ymax":187.12},{"_xmin":65.56,"_ymin":38.7,"_xmax":159.75,"_ymax":187.15},{"_xmin":65.97,"_ymin":38.2,"_xmax":160.33,"_ymax":186.95},{"_xmin":65.14,"_ymin":38.39,"_xmax":159.37,"_ymax":186.99},{"_xmin":65.3,"_ymin":38.55,"_xmax":159.39,"_ymax":187.04},{"_xmin":65.49,"_ymin":38.68,"_xmax":159.44,"_ymax":187.09},{"_xmin":65.71,"_ymin":37.85,"_xmax":159.54,"_ymax":186.21},{"_xmin":65.96,"_ymin":37.93,"_xmax":159.66,"_ymax":186.27},{"_xmin":65.29,"_ymin":37.98,"_xmax":158.86,"_ymax":186.35},{"_xmin":65.59,"_ymin":38.0,"_xmax":159.04,"_ymax":186.43},{"_xmin":65.9,"_ymin":38.0,"_xmax":159.24,"_ymax":186.51},{"_xmin":66.24,"_ymin":37.96,"_xmax":159.45,"_ymax":186.6},{"_xmin":66.25,"_ymin":38.6,"_xmax":159.06,"_ymax":186.93},{"_xmin":66.6,"_ymin":38.5,"_xmax":159.3,"_ymax":187.02},{"_xmin":66.01,"_ymin":38.37,"_xmax":158.61,"_ymax":187.12},{"_xmin":66.05,"_ymin":37.97,"_xmax":158.25,"_ymax":186.52},{"_xmin":66.08,"_ymin":38.48,"_xmax":157.9,"_ymax":186.86},{"_xmin":66.42,"_ymin":38.25,"_xmax":158.16,"_ymax":186.97},{"_xmin":67.38,"_ymin":38.68,"_xmax":158.73,"_ymax":187.31},{"_xmin":67.38,"_ymin":38.15,"_xmax":158.37,"_ymax":186.73},{"_xmin":67.37,"_ymin":38.52,"_xmax":158.0,"_ymax":187.07},{"_xmin":67.34,"_ymin":37.92,"_xmax":157.62,"_ymax":186.49},{"_xmin":67.29,"_ymin":38.21,"_xmax":157.22,"_ymax":186.84},{"_xmin":68.14,"_ymin":38.47,"_xmax":157.73,"_ymax":187.18},{"_xmin":67.71,"_ymin":38.45,"_xmax":156.
7,"_ymax":186.84},{"_xmin":68.49,"_ymin":38.63,"_xmax":157.16,"_ymax":187.18},{"_xmin":68.33,"_ymin":38.77,"_xmax":156.68,"_ymax":187.52},{"_xmin":68.72,"_ymin":38.64,"_xmax":156.49,"_ymax":187.18},{"_xmin":69.06,"_ymin":38.47,"_xmax":156.28,"_ymax":186.84},{"_xmin":69.67,"_ymin":38.49,"_xmax":156.6,"_ymax":187.17},{"_xmin":69.03,"_ymin":38.25,"_xmax":155.42,"_ymax":186.83},{"_xmin":69.24,"_ymin":37.97,"_xmax":155.1,"_ymax":186.49},{"_xmin":70.27,"_ymin":38.54,"_xmax":155.62,"_ymax":187.03},{"_xmin":70.36,"_ymin":38.18,"_xmax":155.2,"_ymax":186.69},{"_xmin":70.37,"_ymin":38.66,"_xmax":154.75,"_ymax":187.22},{"_xmin":70.22,"_ymin":38.23,"_xmax":154.28,"_ymax":186.88},{"_xmin":70.65,"_ymin":38.64,"_xmax":154.65,"_ymax":187.41},{"_xmin":70.37,"_ymin":37.97,"_xmax":154.48,"_ymax":186.44},{"_xmin":69.97,"_ymin":38.34,"_xmax":154.87,"_ymax":186.98},{"_xmin":69.9,"_ymin":38.5,"_xmax":155.6,"_ymax":186.89},{"_xmin":69.37,"_ymin":38.03,"_xmax":156.08,"_ymax":186.61},{"_xmin":68.91,"_ymin":38.2,"_xmax":156.09,"_ymax":186.57},{"_xmin":68.48,"_ymin":38.59,"_xmax":155.89,"_ymax":187.15},{"_xmin":69.11,"_ymin":37.97,"_xmax":156.05,"_ymax":186.33},{"_xmin":69.63,"_ymin":38.4,"_xmax":155.98,"_ymax":186.96},{"_xmin":69.64,"_ymin":38.06,"_xmax":155.14,"_ymax":186.81},{"_xmin":70.38,"_ymin":38.36,"_xmax":154.61,"_ymax":186.89},{"_xmin":71.52,"_ymin":38.11,"_xmax":154.61,"_ymax":186.81},{"_xmin":71.6,"_ymin":38.51,"_xmax":153.29,"_ymax":186.97},{"_xmin":72.7,"_ymin":38.38,"_xmax":153.24,"_ymax":186.96},{"_xmin":72.9,"_ymin":38.3,"_xmax":152.36,"_ymax":187.0},{"_xmin":73.49,"_ymin":38.08,"_xmax":151.73,"_ymax":186.46},{"_xmin":73.52,"_ymin":38.05,"_xmax":150.79,"_ymax":186.5},{"_xmin":74.28,"_ymin":38.01,"_xmax":150.58,"_ymax":186.48},{"_xmin":75.02,"_ymin":37.92,"_xmax":150.33,"_ymax":186.4},{"_xmin":75.03,"_ymin":38.52,"_xmax":149.3,"_ymax":186.97},{"_xmin":75.84,"_ymin":38.26,"_xmax":149.0,"_ymax":186.65},{"_xmin":76.26,"_ymin":38.06,"_xmax":148.39,"_ymax":186.74},{"_xmin":77.25,"_ym
in":38.33,"_xmax":148.05,"_ymax":186.85},{"_xmin":77.84,"_ymin":38.19,"_xmax":147.41,"_ymax":186.68},{"_xmin":77.99,"_ymin":38.61,"_xmax":146.46,"_ymax":187.07},{"_xmin":79.15,"_ymin":38.69,"_xmax":146.7,"_ymax":187.36},{"_xmin":79.2,"_ymin":38.6,"_xmax":145.72,"_ymax":187.26},{"_xmin":79.89,"_ymin":38.57,"_xmax":145.48,"_ymax":187.25},{"_xmin":80.42,"_ymin":37.94,"_xmax":145.24,"_ymax":186.51},{"_xmin":80.29,"_ymin":38.45,"_xmax":144.69,"_ymax":187.03},{"_xmin":80.7,"_ymin":38.3,"_xmax":144.9,"_ymax":186.74},{"_xmin":80.45,"_ymin":38.29,"_xmax":144.83,"_ymax":186.99},{"_xmin":80.45,"_ymin":38.66,"_xmax":145.07,"_ymax":187.41},{"_xmin":79.96,"_ymin":38.53,"_xmax":144.87,"_ymax":187.26},{"_xmin":79.85,"_ymin":38.01,"_xmax":144.99,"_ymax":186.57},{"_xmin":79.77,"_ymin":38.15,"_xmax":145.16,"_ymax":186.67},{"_xmin":79.84,"_ymin":38.34,"_xmax":145.36,"_ymax":186.84},{"_xmin":79.41,"_ymin":37.91,"_xmax":144.86,"_ymax":186.32},{"_xmin":80.24,"_ymin":38.6,"_xmax":145.61,"_ymax":187.18},{"_xmin":79.76,"_ymin":37.97,"_xmax":144.86,"_ymax":186.59},{"_xmin":80.15,"_ymin":38.27,"_xmax":144.86,"_ymax":186.79},{"_xmin":80.18,"_ymin":38.18,"_xmax":144.54,"_ymax":186.83},{"_xmin":81.0,"_ymin":38.19,"_xmax":144.93,"_ymax":186.88},{"_xmin":81.05,"_ymin":38.22,"_xmax":144.5,"_ymax":186.94},{"_xmin":80.76,"_ymin":38.0,"_xmax":143.54,"_ymax":186.41},{"_xmin":81.44,"_ymin":38.65,"_xmax":143.72,"_ymax":187.2},{"_xmin":81.74,"_ymin":38.27,"_xmax":143.38,"_ymax":186.65},{"_xmin":82.22,"_ymin":38.0,"_xmax":143.46,"_ymax":186.65},{"_xmin":82.27,"_ymin":38.22,"_xmax":143.03,"_ymax":186.82},{"_xmin":82.17,"_ymin":38.35,"_xmax":142.58,"_ymax":186.97},{"_xmin":82.66,"_ymin":38.39,"_xmax":142.86,"_ymax":187.11},{"_xmin":82.67,"_ymin":38.15,"_xmax":142.69,"_ymax":186.65},{"_xmin":82.78,"_ymin":38.76,"_xmax":142.97,"_ymax":187.5},{"_xmin":82.45,"_ymin":38.33,"_xmax":142.8,"_ymax":186.99},{"_xmin":82.02,"_ymin":38.57,"_xmax":142.67,"_ymax":187.22},{"_xmin":82.31,"_ymin":38.7,"_xmax":143.31,"_ymax":18
7.43},{"_xmin":81.56,"_ymin":38.54,"_xmax":142.8,"_ymax":187.04},{"_xmin":81.93,"_ymin":38.47,"_xmax":143.54,"_ymax":187.18},{"_xmin":82.12,"_ymin":38.1,"_xmax":143.89,"_ymax":186.74},{"_xmin":81.74,"_ymin":38.39,"_xmax":143.56,"_ymax":187.01},{"_xmin":81.51,"_ymin":38.58,"_xmax":143.3,"_ymax":187.26},{"_xmin":81.87,"_ymin":38.54,"_xmax":143.39,"_ymax":186.94},{"_xmin":81.9,"_ymin":37.93,"_xmax":143.24,"_ymax":186.43},{"_xmin":82.04,"_ymin":38.05,"_xmax":143.13,"_ymax":186.65},{"_xmin":82.26,"_ymin":38.19,"_xmax":143.05,"_ymax":186.88},{"_xmin":82.25,"_ymin":38.2,"_xmax":142.56,"_ymax":186.58},{"_xmin":82.59,"_ymin":38.46,"_xmax":142.51,"_ymax":186.85},{"_xmin":82.52,"_ymin":38.29,"_xmax":142.19,"_ymax":186.96},{"_xmin":82.9,"_ymin":38.06,"_xmax":142.15,"_ymax":186.58},{"_xmin":82.85,"_ymin":38.18,"_xmax":141.81,"_ymax":186.78},{"_xmin":83.48,"_ymin":38.47,"_xmax":142.17,"_ymax":187.02},{"_xmin":83.66,"_ymin":38.44,"_xmax":142.21,"_ymax":187.15},{"_xmin":83.79,"_ymin":38.65,"_xmax":142.21,"_ymax":187.35},{"_xmin":83.15,"_ymin":38.42,"_xmax":141.47,"_ymax":186.91},{"_xmin":83.41,"_ymin":37.96,"_xmax":141.8,"_ymax":186.38},{"_xmin":83.15,"_ymin":37.93,"_xmax":141.79,"_ymax":186.44},{"_xmin":82.8,"_ymin":38.64,"_xmax":141.72,"_ymax":187.2},{"_xmin":82.82,"_ymin":38.18,"_xmax":141.9,"_ymax":186.59},{"_xmin":82.78,"_ymin":38.57,"_xmax":142.04,"_ymax":187.19},{"_xmin":82.86,"_ymin":38.58,"_xmax":142.02,"_ymax":187.13},{"_xmin":82.93,"_ymin":38.29,"_xmax":141.97,"_ymax":186.94},{"_xmin":83.0,"_ymin":38.25,"_xmax":141.91,"_ymax":186.79},{"_xmin":83.38,"_ymin":38.13,"_xmax":142.27,"_ymax":186.71},{"_xmin":83.09,"_ymin":38.57,"_xmax":141.92,"_ymax":186.99},{"_xmin":83.14,"_ymin":38.03,"_xmax":142.01,"_ymax":186.59},{"_xmin":83.28,"_ymin":38.14,"_xmax":142.13,"_ymax":186.86},{"_xmin":83.22,"_ymin":38.35,"_xmax":141.84,"_ymax":187.05},{"_xmin":83.7,"_ymin":38.43,"_xmax":141.88,"_ymax":186.98},{"_xmin":83.6,"_ymin":38.05,"_xmax":141.25,"_ymax":186.64},{"_xmin":84.35,"_ymin":38.2
4,"_xmax":141.38,"_ymax":186.93},{"_xmin":84.26,"_ymin":38.39,"_xmax":140.44,"_ymax":186.8},{"_xmin":84.49,"_ymin":38.2,"_xmax":139.94,"_ymax":186.64},{"_xmin":85.35,"_ymin":38.2,"_xmax":140.12,"_ymax":186.65},{"_xmin":85.31,"_ymin":38.37,"_xmax":139.52,"_ymax":186.84},{"_xmin":85.7,"_ymin":38.04,"_xmax":139.51,"_ymax":186.53},{"_xmin":85.9,"_ymin":38.58,"_xmax":139.35,"_ymax":187.12},{"_xmin":86.01,"_ymin":38.6,"_xmax":138.99,"_ymax":187.24},{"_xmin":86.56,"_ymin":37.94,"_xmax":138.67,"_ymax":186.37},{"_xmin":86.84,"_ymin":38.28,"_xmax":137.83,"_ymax":186.94},{"_xmin":87.73,"_ymin":38.41,"_xmax":137.26,"_ymax":187.04},{"_xmin":88.86,"_ymin":38.28,"_xmax":136.91,"_ymax":187.0},{"_xmin":89.33,"_ymin":38.17,"_xmax":135.96,"_ymax":186.81},{"_xmin":90.12,"_ymin":38.15,"_xmax":135.64,"_ymax":186.85},{"_xmin":90.15,"_ymin":38.29,"_xmax":134.72,"_ymax":186.72},{"_xmin":90.78,"_ymin":38.09,"_xmax":134.77,"_ymax":186.74},{"_xmin":90.65,"_ymin":38.16,"_xmax":134.13,"_ymax":186.72},{"_xmin":90.68,"_ymin":38.01,"_xmax":133.8,"_ymax":186.53},{"_xmin":90.88,"_ymin":38.25,"_xmax":133.78,"_ymax":186.84},{"_xmin":90.97,"_ymin":37.99,"_xmax":133.7,"_ymax":186.5},{"_xmin":91.23,"_ymin":38.68,"_xmax":133.93,"_ymax":187.34},{"_xmin":91.1,"_ymin":37.93,"_xmax":133.73,"_ymax":186.4},{"_xmin":91.15,"_ymin":38.01,"_xmax":133.83,"_ymax":186.65},{"_xmin":91.47,"_ymin":38.54,"_xmax":134.17,"_ymax":187.13},{"_xmin":91.02,"_ymin":38.33,"_xmax":133.78,"_ymax":187.04},{"_xmin":91.53,"_ymin":38.65,"_xmax":134.29,"_ymax":187.36},{"_xmin":91.04,"_ymin":38.29,"_xmax":133.72,"_ymax":186.94},{"_xmin":91.56,"_ymin":38.06,"_xmax":134.05,"_ymax":186.63},{"_xmin":91.12,"_ymin":38.05,"_xmax":133.34,"_ymax":186.51},{"_xmin":91.35,"_ymin":38.29,"_xmax":133.26,"_ymax":186.84},{"_xmin":91.6,"_ymin":38.62,"_xmax":133.18,"_ymax":187.29},{"_xmin":91.91,"_ymin":38.29,"_xmax":133.09,"_ymax":186.94},{"_xmin":92.25,"_ymin":38.54,"_xmax":133.01,"_ymax":187.01},{"_xmin":92.55,"_ymin":38.42,"_xmax":133.04,"_ymax":187.11},
{"_xmin":92.89,"_ymin":38.24,"_xmax":133.09,"_ymax":186.89},{"_xmin":92.91,"_ymin":38.16,"_xmax":132.89,"_ymax":186.8},{"_xmin":92.6,"_ymin":38.18,"_xmax":132.47,"_ymax":186.84},{"_xmin":92.31,"_ymin":38.14,"_xmax":132.09,"_ymax":186.56},{"_xmin":92.59,"_ymin":38.36,"_xmax":132.47,"_ymax":186.89},{"_xmin":92.25,"_ymin":38.52,"_xmax":132.29,"_ymax":186.95},{"_xmin":92.21,"_ymin":38.15,"_xmax":132.52,"_ymax":186.61},{"_xmin":92.46,"_ymin":38.51,"_xmax":133.2,"_ymax":187.19},{"_xmin":91.82,"_ymin":38.01,"_xmax":132.98,"_ymax":186.54},{"_xmin":92.14,"_ymin":38.25,"_xmax":133.85,"_ymax":186.94},{"_xmin":91.58,"_ymin":38.26,"_xmax":133.82,"_ymax":186.86},{"_xmin":91.05,"_ymin":38.21,"_xmax":133.9,"_ymax":186.83},{"_xmin":90.95,"_ymin":38.59,"_xmax":134.38,"_ymax":187.07},{"_xmin":90.87,"_ymin":38.22,"_xmax":134.96,"_ymax":186.8},{"_xmin":90.36,"_ymin":38.17,"_xmax":135.08,"_ymax":186.71},{"_xmin":90.15,"_ymin":38.48,"_xmax":135.57,"_ymax":187.13},{"_xmin":89.77,"_ymin":38.2,"_xmax":135.84,"_ymax":186.73},{"_xmin":89.37,"_ymin":38.12,"_xmax":136.14,"_ymax":186.58},{"_xmin":88.48,"_ymin":38.41,"_xmax":136.0,"_ymax":186.82},{"_xmin":88.32,"_ymin":38.17,"_xmax":136.72,"_ymax":186.81},{"_xmin":87.94,"_ymin":37.96,"_xmax":137.14,"_ymax":186.43},{"_xmin":87.45,"_ymin":38.31,"_xmax":137.58,"_ymax":186.83},{"_xmin":87.25,"_ymin":38.38,"_xmax":138.33,"_ymax":186.85},{"_xmin":86.9,"_ymin":38.26,"_xmax":139.04,"_ymax":186.97},{"_xmin":85.8,"_ymin":38.23,"_xmax":138.88,"_ymax":186.81},{"_xmin":85.52,"_ymin":38.38,"_xmax":139.52,"_ymax":186.84},{"_xmin":84.95,"_ymin":38.14,"_xmax":139.96,"_ymax":186.85},{"_xmin":84.84,"_ymin":38.45,"_xmax":140.7,"_ymax":187.15},{"_xmin":84.48,"_ymin":37.9,"_xmax":141.01,"_ymax":186.35},{"_xmin":83.73,"_ymin":38.19,"_xmax":141.0,"_ymax":186.81},{"_xmin":83.4,"_ymin":38.28,"_xmax":141.23,"_ymax":186.83},{"_xmin":83.08,"_ymin":38.36,"_xmax":141.43,"_ymax":186.91},{"_xmin":83.46,"_ymin":38.42,"_xmax":142.27,"_ymax":187.04},{"_xmin":82.9,"_ymin":38.3,"_xmax
":142.0,"_ymax":186.7},{"_xmin":82.63,"_ymin":38.33,"_xmax":142.1,"_ymax":186.91},{"_xmin":82.79,"_ymin":38.18,"_xmax":142.44,"_ymax":186.63},{"_xmin":82.55,"_ymin":38.19,"_xmax":142.49,"_ymax":186.88},{"_xmin":82.74,"_ymin":38.02,"_xmax":142.79,"_ymax":186.63},{"_xmin":82.27,"_ymin":38.51,"_xmax":142.38,"_ymax":187.05},{"_xmin":82.49,"_ymin":38.32,"_xmax":142.63,"_ymax":186.8},{"_xmin":82.06,"_ymin":38.13,"_xmax":142.2,"_ymax":186.53},{"_xmin":82.59,"_ymin":38.1,"_xmax":142.81,"_ymax":186.74},{"_xmin":82.19,"_ymin":38.56,"_xmax":142.35,"_ymax":187.09},{"_xmin":82.75,"_ymin":38.52,"_xmax":142.94,"_ymax":187.24},{"_xmin":82.39,"_ymin":38.31,"_xmax":142.45,"_ymax":186.84},{"_xmin":82.31,"_ymin":38.27,"_xmax":142.35,"_ymax":186.92},{"_xmin":82.66,"_ymin":38.06,"_xmax":142.51,"_ymax":186.46},{"_xmin":82.63,"_ymin":38.02,"_xmax":142.4,"_ymax":186.48},{"_xmin":82.61,"_ymin":37.98,"_xmax":142.29,"_ymax":186.47},{"_xmin":82.61,"_ymin":37.95,"_xmax":142.17,"_ymax":186.44},{"_xmin":82.64,"_ymin":38.59,"_xmax":142.06,"_ymax":187.06},{"_xmin":82.69,"_ymin":38.56,"_xmax":141.94,"_ymax":187.0},{"_xmin":83.02,"_ymin":38.05,"_xmax":142.23,"_ymax":186.77},{"_xmin":83.11,"_ymin":38.04,"_xmax":142.13,"_ymax":186.69},{"_xmin":83.22,"_ymin":38.04,"_xmax":142.04,"_ymax":186.6},{"_xmin":83.36,"_ymin":38.05,"_xmax":141.95,"_ymax":186.51},{"_xmin":83.78,"_ymin":38.25,"_xmax":142.27,"_ymax":186.92},{"_xmin":83.3,"_ymin":38.28,"_xmax":141.54,"_ymax":186.84},{"_xmin":83.5,"_ymin":38.34,"_xmax":141.49,"_ymax":186.76},{"_xmin":83.99,"_ymin":38.57,"_xmax":141.86,"_ymax":187.19},{"_xmin":83.56,"_ymin":37.99,"_xmax":141.17,"_ymax":186.44},{"_xmin":84.08,"_ymin":38.25,"_xmax":141.56,"_ymax":186.88},{"_xmin":83.66,"_ymin":38.37,"_xmax":140.89,"_ymax":186.8},{"_xmin":84.18,"_ymin":38.01,"_xmax":141.3,"_ymax":186.56},{"_xmin":84.03,"_ymin":38.33,"_xmax":141.05,"_ymax":186.98},{"_xmin":84.54,"_ymin":38.68,"_xmax":141.48,"_ymax":187.4},{"_xmin":84.09,"_ymin":38.2,"_xmax":140.84,"_ymax":186.63},{"_xmin":8
3.89,"_ymin":38.58,"_xmax":140.61,"_ymax":187.03},{"_xmin":84.34,"_ymin":38.3,"_xmax":141.06,"_ymax":186.74},{"_xmin":84.09,"_ymin":38.01,"_xmax":140.84,"_ymax":186.43},{"_xmin":84.07,"_ymin":38.57,"_xmax":141.02,"_ymax":187.3},{"_xmin":84.42,"_ymin":38.27,"_xmax":141.49,"_ymax":186.95},{"_xmin":84.04,"_ymin":38.64,"_xmax":141.27,"_ymax":187.28},{"_xmin":83.61,"_ymin":38.3,"_xmax":141.05,"_ymax":186.89},{"_xmin":83.82,"_ymin":37.95,"_xmax":141.51,"_ymax":186.5},{"_xmin":83.28,"_ymin":38.4,"_xmax":141.29,"_ymax":186.95},{"_xmin":83.4,"_ymin":38.38,"_xmax":141.76,"_ymax":187.04},{"_xmin":83.2,"_ymin":38.57,"_xmax":141.8,"_ymax":187.12},{"_xmin":82.98,"_ymin":38.49,"_xmax":141.85,"_ymax":187.09},{"_xmin":83.43,"_ymin":38.68,"_xmax":142.58,"_ymax":187.36},{"_xmin":83.17,"_ymin":38.18,"_xmax":142.61,"_ymax":186.88},{"_xmin":82.91,"_ymin":38.14,"_xmax":142.63,"_ymax":186.83},{"_xmin":82.65,"_ymin":38.37,"_xmax":142.64,"_ymax":187.04},{"_xmin":82.4,"_ymin":38.2,"_xmax":142.64,"_ymax":186.86},{"_xmin":82.17,"_ymin":38.4,"_xmax":142.63,"_ymax":187.06},{"_xmin":82.65,"_ymin":38.36,"_xmax":143.3,"_ymax":187.03},{"_xmin":82.47,"_ymin":38.16,"_xmax":143.26,"_ymax":186.84},{"_xmin":82.33,"_ymin":38.57,"_xmax":143.21,"_ymax":187.28},{"_xmin":81.96,"_ymin":38.1,"_xmax":142.73,"_ymax":186.51},{"_xmin":81.91,"_ymin":38.56,"_xmax":142.65,"_ymax":187.01},{"_xmin":81.91,"_ymin":38.39,"_xmax":142.55,"_ymax":186.9},{"_xmin":82.65,"_ymin":38.28,"_xmax":143.14,"_ymax":186.86},{"_xmin":82.74,"_ymin":38.23,"_xmax":143.04,"_ymax":186.87},{"_xmin":82.86,"_ymin":38.22,"_xmax":142.93,"_ymax":186.92},{"_xmin":82.74,"_ymin":38.08,"_xmax":142.42,"_ymax":186.49},{"_xmin":82.92,"_ymin":38.15,"_xmax":142.33,"_ymax":186.6},{"_xmin":83.13,"_ymin":38.25,"_xmax":142.25,"_ymax":186.73},{"_xmin":83.36,"_ymin":38.37,"_xmax":142.19,"_ymax":186.86},{"_xmin":82.92,"_ymin":38.51,"_xmax":141.46,"_ymax":186.98},{"_xmin":83.17,"_ymin":37.96,"_xmax":141.45,"_ymax":186.39},{"_xmin":83.72,"_ymin":38.29,"_xmax":141.89,"
_ymax":186.99},{"_xmin":83.3,"_ymin":38.44,"_xmax":141.25,"_ymax":187.03},{"_xmin":83.58,"_ymin":38.58,"_xmax":141.34,"_ymax":187.02},{"_xmin":83.44,"_ymin":38.18,"_xmax":141.2,"_ymax":186.78},{"_xmin":83.71,"_ymin":38.22,"_xmax":141.39,"_ymax":186.63},{"_xmin":83.36,"_ymin":38.56,"_xmax":140.93,"_ymax":186.99},{"_xmin":84.25,"_ymin":37.98,"_xmax":141.52,"_ymax":186.53},{"_xmin":84.09,"_ymin":38.12,"_xmax":140.65,"_ymax":186.69},{"_xmin":84.96,"_ymin":38.21,"_xmax":140.42,"_ymax":186.78},{"_xmin":85.3,"_ymin":38.1,"_xmax":139.47,"_ymax":186.49},{"_xmin":86.13,"_ymin":38.12,"_xmax":139.3,"_ymax":186.68},{"_xmin":86.29,"_ymin":38.12,"_xmax":138.83,"_ymax":186.64},{"_xmin":86.55,"_ymin":38.09,"_xmax":139.09,"_ymax":186.57}]}},"_activities":{"c30054e8eb9011ea9217ac1f6b2c363c":{"_id":"c30054e8eb9011ea9217ac1f6b2c363c","_startframe":0,"_endframe":285,"_framerate":30.0,"_label":"person_carries_heavy_object","_shortlabel":"carrying","_trackid":["c3005754eb9011ea9217ac1f6b2c363c"],"_actorid":null,"attributes":{"blurred_faces":0,"collected_date":"2020-05-06 15:27:25","collection_id":"P004C006","collector_id":"viraj.1892@gmail.com","device_identifier":"android","device_type":"CPH1969","duration":16,"frame_height":1920,"frame_rate":30.0,"frame_width":1080,"orientation":"portrait","os_version":"28","project_id":"P004","subject_ids":["20200506_1527244575402164829910826"],"video_id":"20200506_1527244575402164829910826","rotate":null}}}}'
We recommend that you introduce data augmentation in the form of scale jittering prior to training. The MEVA dataset includes many ultra-tiny people walking far from the camera. The PIP dataset does not include these tiny people, but the scale variation can be introduced by downsampling the crops to a lower resolution and upsampling them again, to better match the target domain prior to training.
v.thumbnail(frame=2).show().mindim(32).mindim(256).show()
<vipy.image.scene: height=256, width=256, color=rgb, category="person_carries_heavy_object", objects=1>
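One way to randomize the amount of downsampling is to draw the intermediate resolution at random for each training sample. The sketch below samples it log-uniformly so that small scales are drawn as often as large ones; the range of 32 to 224 pixels, the log-uniform choice and the helper name `sample_jitter_mindim` are assumptions for illustration.

```python
import math
import random

def sample_jitter_mindim(rng, lo=32, hi=224):
    """Sample an intermediate minimum dimension in [lo, hi], uniform in log scale."""
    return int(round(math.exp(rng.uniform(math.log(lo), math.log(hi)))))

rng = random.Random(0)  # seeded for repeatability
sizes = [sample_jitter_mindim(rng) for _ in range(5)]
print(sizes)  # five integers between 32 and 224

# Each sampled size s would replace the fixed 32 in the example above,
# i.e. downsample to mindim(s) and then upsample back with mindim(256).
```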